STATE  OF  THE  ART 


BROWSING  THROUGH 
TERABYTES 

Wide-area  information  servers  open  a  new  frontier  in  personal  and  corporate  information  services 

RICHARD  MARLON  STEIN 

The Library of Congress archives roughly 25 terabytes in its collection.
To browse through this volume on your own would be nearly impossible.
Wide-area information servers (WAISes) supply the means to achieve this
goal by providing the user-interface structure and underlying
information-retrieval protocol necessary to automatically collate,
collect, and integrate diverse data streams. WAISes can distill the
contents of vast archives into neatly manageable and browsable folders.

On-line information services, such as BIX and CompuServe, attest to the
need for this kind of technology. Information has acquired a
commodity-like status. While not on a par with wheat, pork bellies, or
gold futures, the information-service industry fills a vital role. The
next phase of information commerce will add WAIS capabilities to
existing on-line services, opening a new frontier in personal and
corporate information services.

Intentions  and  Goals 

Initiated in early 1989, the WAIS engineering effort is spearheaded by
Thinking Machines (Cambridge, MA), the manufacturer of the Connection
Machine, a massively parallel supercomputer (see reference 1). The
principal goal of the research project is to demonstrate "how current
technology can be used to open a market of information services that
will allow a user's workstation to act as librarian and information
collection agent from a large number of sources." (See reference 2.)
WAISes aim to enhance existing information services and provide a
utilitarian mechanism for the industry.

Information servers already provide services [...] travel reservations
[...] quotes on-line. These services are highly interactive, charging
users on the basis of minutes spent on-line, and each has a [...].
WAISes [...] interaction through a predominantly computer-to-computer
approach to remote information retrieval. By minimizing human
interaction with a remote information server, they handle requests for
[...] expeditiously and inexpensively. WAISes also alleviate unnecessary
complexity by moving all user interaction to the local workstation and
having WAIS software handle all transactions with the remote server.

On-line servers are limited in their connectivity. While many services,
such as BIX, CompuServe, and AppleLink, incorporate wide-area network
structures, sharing information between different services is not a
wholly [...] option. This restriction constrains information commerce
and hampers the circulation of potentially useful ideas. WAISes
circumvent this barrier with a standard information-exchange protocol
that offers unlimited connectivity and [...] functionality. All servers
can apply the WAIS protocol to their archive structure to conduct
information retrieval. Unlimited connectivity also raises concerns of
security and privacy. (See the text box "The Right to Privacy.")

ACTION SUMMARY

The next phase of information commerce will add wide-area information
server capabilities to existing on-line services. WAISes provide the
user-interface structure and the underlying information-retrieval
protocol necessary to automatically collate, collect, and integrate
information from various sources. When these are implemented, you
should be able to directly access such sources as the Library of
Congress and the myriad of newspapers, journals, and books.

[...] and coherent information of topical importance has value.
Individuals and companies should be able to market their information to
the widest possible audience. Current on-line services can't easily
accomplish this, since their connectivity is restricted.

To direct your information to the best marketplace, you could subscribe
to multiple on-line sources and post the same message on all of them.
But it would be more efficient to post the data on one server and have
the data, or an abstract of it, broadcast to the others. Using the WAIS
protocol, WAISes facilitate this [...].

Suppose, for example, you have reviewed the latest set of RISC
microprocessor benchmarks, taking note of specific architectural
advantages, and you want to make this information available to others.
The benchmark review is kept on your home computer ([...] WAIS), which
is equipped with WAIS technology. The nearest remote WAIS, a hub with a
network of servers, also has a folder for RISC microprocessors. So you
make a posting to the nearest hub server that inserts a pointer to the
review on your home computer.

Everyone with a computer running the WAIS user-interface [...] can
present information to a server and receive compensation for whatever
portion of it other WAIS subscribers access. The compensation can be
monetary, or you could exchange your information for someone else's.
Even publishers of books, magazines, newspapers, and music can
participate and benefit from WAISes. For example, how much money would
a newspaper save in circulation costs if you received the morning paper
electronically instead of on paper? Similarly, how much money could a
book publisher save if you purchased a new best-selling novel
electronically instead of at a bookstore?

[...] information delivery is expensive, and costs are rising. The U.S.
Postal Service frequently raises its fees [...] and transporting [...].
[...] information transport also represents a significant fraction of
transport volume and collateral energy consumption. Moving information
electronically can result in enormous savings.

Computer networks such as the Internet are conduits of information
transport. To replace manual transportation methods, the existing
electronic infrastructure [...] volume of traffic. Plans for a national
network of data superhighways, which will be installed within the next
few years, are under way (see references 3 [...]).

A principal motivation for WAIS technology is to be able to retrieve
topical information for research or investigation, not just to deliver
consumable items like newspapers or books. Toward this end, WAISes rely
on a novel structure for information retrieval, the dynamic folder.

To use a WAIS, you formulate a question (see figure 1), find the
information servers that provide satisfactory responses, and create a
dynamic folder. The purpose of the dynamic folder is to constantly or
periodically update its contents with new material on the subject.
Formulating a question is natural to us all. The difficult part is
locating the pertinent information to answer it. Manually locating the
information can be laborious and tedious. WAISes automate the
search-and-retrieval process. To determine which servers hold the
information most pertinent to your question and where you should submit
dynamic folders, you may want to consult server directories.


Server  Directories 

WAIS directories are servers that support a directory-services
function. They are indexes to other services within the WAIS network
and are organized to help you locate information. Like
telephone-directory services, WAIS directories point to servers, which
are grouped according to content and function.

A directory-entry header contains sufficient data to describe the
service, such as [...] server, the parent server (if the server is a
subsidiary of a larger one), related servers, contact information
(including [...] and human-interface points), and cost information.

The local workstation, when equipped with a WAIS, should maintain a
directory entry that includes the directory-entry header, a locally
determined rank, subscription information (if any), user comments, and
the time of last contact. You can use this information to decide
whether to contact the server and how to handle the responses.
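As a rough illustration, the directory entry described above could be modeled as a pair of records. This is only a sketch: the field names, the should_contact rule, and its thresholds are assumptions made for illustration and are not taken from the WAIS software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntryHeader:
    """Data a directory server publishes about a service (names are illustrative)."""
    server: str
    parent_server: Optional[str] = None   # set if the server is a subsidiary
    related_servers: List[str] = field(default_factory=list)
    contact: str = ""
    cost_per_query: float = 0.0           # assumed unit: dollars per query


@dataclass
class LocalDirectoryEntry:
    """What the local workstation keeps about a server it knows of."""
    header: EntryHeader
    rank: float = 0.0                     # locally determined, assumed 0.0 to 1.0
    subscribed: bool = False
    comments: str = ""
    last_contact: Optional[float] = None  # seconds since the epoch, None if never


def should_contact(entry: LocalDirectoryEntry, budget: float,
                   min_rank: float = 0.5) -> bool:
    """Hypothetical policy: contact servers that are affordable and well ranked."""
    affordable = entry.subscribed or entry.header.cost_per_query <= budget
    return affordable and entry.rank >= min_rank


risc = LocalDirectoryEntry(
    header=EntryHeader(server="risc-benchmarks", cost_per_query=0.25),
    rank=0.8,
)
print(should_contact(risc, budget=1.00))   # True
print(should_contact(risc, budget=0.10))   # False
```

Keeping the published header separate from the locally maintained fields mirrors the split in the text: the header comes from the directory server, while rank, comments, and last-contact time belong to your own workstation.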

By using content navigation, you can find the most appropriate server
to handle a query. For example, a question on RISC microprocessor
benchmarks would list directory entries for servers as well as pointers
to articles on the subject. When you retrieve a document, the directory
entry is also provided. Thus, you obtain ranking information for
questions of similar content.

[Figure 1: A WAIS question window. The question "Look for documents
about recent developments in personal computers" returns a list of
article titles, and a selected article on note-pad computers is
displayed.]

Each server, then, contains information of value to certain
subscribers. The dynamic folder can continuously poll newspaper servers
for new articles as they arrive from the news wires, while it would
probably query a dictionary or encyclopedia server only once, since the
content changes much less frequently.

Policing the large number of anticipated servers (in the tens of
thousands) requires an independent quality-control mechanism. An audit
of the server directory would reflect any server that frequently
returns erroneous information or does not perform. An independent
agency like Consumer Reports, the Better Business Bureau, or other
watchdog groups could create rating servers, which monitor and rate
other servers in the directory.

These rating servers resemble movie and TV critics: Consumers acquire
confidence in the reports and reviews that certain critics issue
because they share similar tastes. Just as moviegoers start to trust a
particular reviewer who has agreed with them on past movies, WAIS users
will begin to trust the specific rating services that agree with them.

A subscriber base generates income for a server. The rating servers
will attract subscribers as well, for they direct trends in the
information marketplace. In fact, they may become the first
"information speculators" as a by-product of WAIS technology.

Dynamic Folders

A folder, like those found on the Macintosh, provides the WAIS
framework for organizing questions. A folder is a repository for
documents. A file system, in the Macintosh sense, is full of folders
organized in a tree structure that supports an efficient
document-location mechanism.

To find a document within a file system, you typically use the find
command under Unix or Finder on the Mac.

Figure 2: The similar to function lets you retrieve more documents on
notepad computers using relevance feedback. You then might initiate a
search for additional documents with similar content. Selecting text
from a section of a retrieved document helps to refine subject-matter
searches or locate collateral information. You can also use the
selected text to execute a new query. (Courtesy of Thinking Machines
Corp.)


With one of these tools, you can locate the position of a file and gain
access to its contents. Path-driven locators search an information base
for a document's name, but they do not provide a means to examine its
contents.

Retrieving documents pertinent to a specific question requires content
navigation (i.e., examining the contents of a document, or a
representative abstract or index for the document, for its relevance to
the question). The similarity between the question and the document's
index determines a retrieval score, an indication of the likelihood
that the document is pertinent.
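The article does not say how the retrieval score is computed. As a minimal sketch, assume the score is simply the fraction of the question's distinct words that also appear in a document's index; the function names and the sample index text below are illustrative only.

```python
import re
from typing import Dict, List, Set, Tuple


def words(text: str) -> Set[str]:
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieval_score(question: str, index_text: str) -> float:
    """Fraction of the question's distinct words found in the document index."""
    q = words(question)
    if not q:
        return 0.0
    return len(q & words(index_text)) / len(q)


def rank(question: str, indexes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (title, score) pairs, best match first."""
    scored = [(title, retrieval_score(question, text))
              for title, text in indexes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


indexes = {
    "Note-pad computers": "computer makers stake out positions in note-pad computers",
    "Stock split": "directors approve stock split",
}
for title, score in rank("note-pad computers", indexes):
    print(f"{score:.2f}  {title}")
```

A production system would weight rare words more heavily than common ones, but even this crude overlap measure shows how a score can order documents by likely pertinence rather than by file name.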

WAISes rely on the dynamic folder to encapsulate a question. In its
most passive form, it contains a question and a set of servers to
target. The WAIS posts the dynamic folder to servers of known quality
and functionality, and then query processing begins.

The dynamic folder executes a remote query that sends questions to the
remote servers. There the questions find relevant information and
return a list of document titles (document pointers) encapsulated
within the originating folder to the local WAIS system. The results
from the query may initially include a list of documents with fair,
good, or high similarities.

Now you can refine your query strategy by perusing the document titles
to determine which are the most appropriate documents. WAIS technology,
in the form of the WAIStation user interface (see reference 5), assists
this process through a content-associativity function known as
similar to.

The similar to function informs the WAIS user interface that a document
is "interesting." The server uses this information to find other
documents that are similar to the one you have chosen. This search
strategy, an embedded component of WAISes, represents a significant
improvement over traditional database methods, such as Structured Query
Language (SQL) and Boolean search.

This form of query execution is known as relevance feedback. It lets
you extend the query to incorporate a "more-like-that-one"
functionality and lets you retrieve documents that have similar
contents. The WAIS user interface is organized around the English
language, and English-language-oriented query structures are easier to
use than SQL.

The similar to function is like working with a reference librarian.
First, you state the topic of your research, which the librarian
translates into queries. After you examine the results of the queries,
you indicate which results were on the mark; thus, the librarian gains
a better understanding of your needs and can improve the search.

With relevance feedback, WAISes can retrieve documents with greater
ease and speed. You no longer need to alter a SQL Boolean operator to
adjust the query filter; instead, you can ask for "more documents like
this one."
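One simple way to picture "more documents like this one" is query expansion: the words that occur most often in the document you marked as interesting are added to the original question. The sketch below assumes that approach and a tiny stop-word list; it is a simplification for illustration, not the algorithm used by the WAIS servers.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop list; a real system would use a larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "by", "that", "is"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic words, with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def more_like_this(query: str, interesting_doc: str, extra_terms: int = 3) -> List[str]:
    """Extend the query with the most frequent new words of a relevant document."""
    query_terms = tokenize(query)
    counts = Counter(w for w in tokenize(interesting_doc) if w not in query_terms)
    expansion = [word for word, _ in counts.most_common(extra_terms)]
    return query_terms + expansion


doc = ("Note pad computers let users enter data by writing. "
       "The note pad recognizes letters written with a pen; the pen replaces keys.")
expanded = more_like_this("personal computers", doc, extra_terms=3)
print(expanded)
```

The expanded word list can then be scored against document indexes just like the original question, which is why you never have to touch a Boolean operator.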

Dynamic folders can also possess vitality, which gives the folder a
continuous charter to execute queries periodically and update its
contents with new material. A folder's charter expresses purpose,
intent, and the goal that you want the query to accomplish. You can
build the folder to periodically poll servers known to receive
frequently updated material that matches its charter.
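A folder with vitality can be pictured as a question plus a polling schedule. In the sketch below, a "server" is just a function that answers a question with a list of titles, and time advances in abstract ticks; the class, its fields, and the schedule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A "server" here is just a function from a question to a list of document titles.
Server = Callable[[str], List[str]]


@dataclass
class DynamicFolder:
    question: str                      # the folder's charter, stated as a question
    servers: Dict[str, Server]         # servers the folder targets, by name
    poll_every: Dict[str, int]         # polling period per server, in ticks
    contents: Set[str] = field(default_factory=set)

    def tick(self, now: int) -> List[str]:
        """Poll every server whose period divides `now`; return newly found titles."""
        new_titles: List[str] = []
        for name, ask in self.servers.items():
            if now % self.poll_every[name] == 0:
                for title in ask(self.question):
                    if title not in self.contents:
                        self.contents.add(title)
                        new_titles.append(title)
        return new_titles


wire = ["Compaq stock split"]
folder = DynamicFolder(
    question="recent developments in personal computers",
    servers={"news": lambda q: list(wire), "encyclopedia": lambda q: ["Computer"]},
    poll_every={"news": 1, "encyclopedia": 1000},
)
first = folder.tick(0)    # both servers are polled at tick 0
wire.append("Note-pad computers")
second = folder.tick(1)   # only the news server is polled again
print(first, second)
```

Giving each server its own period captures the distinction drawn earlier: a news wire is worth polling constantly, while a reference work rarely changes and needs only an occasional visit.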

If the search retrieves an interesting document, WAISes let you select
a portion of the text and use it as an adjunct to the initial query.
Selecting text from a portion of a document that may contain some
particularly topical or relevant information and using it to refine the
search is an innovative approach for exploring subjects (see figure 2).

WAISes also let you chain questions by taking the results of a previous
search, starting a new question with different subject matter, and
dragging the previous results into the similar to menu box (see
figure 3). Chaining questions can either broaden or narrow a search,
depending on the relevance-feedback results.

The recursive capacity of dynamic folders to initiate "sibling" folders
demonstrates the WAIS potential to harness and refine subject matter.
Query refinement alters the charter of a dynamic folder. Sibling
dynamic folders execute directed searches and can have an autonomous
authority to broaden the range of server choices.

Controlling the extent of search expansion is a critical issue. For
individuals, cost can be an overwhelming concern. WAIS technology does
not yet contain an accounting system to govern search criteria.
Participating information services will have to engineer this element
of the technology themselves.

WAIS  Protocol 

WAISes promote connectivity and access to remote electronic-information
sources through a standard protocol, the WAIS protocol. This protocol
is an extension of the National Information Standards Organization
(NISO) Z39.50-1988 specification, which defines an interface to remote
information-retrieval services and library-protocol applications. The
Z39.50 standard is the backbone of the WAIS protocol and the foundation
for WAIS applications development.

Incorporating the Z39.50 standard into the WAIS protocol frees
developers to build articulated user interfaces for WAIS applications.
The interface standard isolates the server's text-retrieval method,
such as SQL, giving the application a transparent access mode. The
particulars of database queries are hidden beneath the interface. A
developer only needs to be sure that the server possesses an equivalent
functionality to conduct remote information-retrieval transactions from
a local WAIS workstation.
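The idea of hiding the retrieval method beneath a common interface can be sketched with an abstract class and two interchangeable back ends. Nothing here reflects the actual Z39.50 or WAIS message formats; the class names and both search strategies are hypothetical and chosen only to show that client code stays the same.

```python
from abc import ABC, abstractmethod
from typing import List


class RetrievalBackend(ABC):
    """What a client relies on; how the search is done stays hidden."""

    @abstractmethod
    def search(self, question: str) -> List[str]:
        """Return titles of documents relevant to the question."""


class KeywordBackend(RetrievalBackend):
    """Matches any question word anywhere in a title."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = titles

    def search(self, question: str) -> List[str]:
        terms = question.lower().split()
        return [t for t in self._titles if any(term in t.lower() for term in terms)]


class PrefixBackend(RetrievalBackend):
    """A second, deliberately different strategy behind the same interface."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = sorted(titles)

    def search(self, question: str) -> List[str]:
        return [t for t in self._titles if t.lower().startswith(question.lower())]


def ask(backend: RetrievalBackend, question: str) -> List[str]:
    # The client code is identical whichever back end answers.
    return backend.search(question)


titles = ["RISC benchmarks", "Stock split"]
print(ask(KeywordBackend(titles), "risc"))
print(ask(PrefixBackend(titles), "risc"))
```

Because the client calls only ask, a server operator could replace one back end with another without any change on the workstation side.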

Concealing the server's implementation through the WAIS protocol is
important in another respect as well. Isolating the implementation
implies that you can specify a single, more palatable query language.
The WAIS protocol also lets you use an English-language-style query
lexicon instead of cryptic SQL or fourth-generation languages. When you
find a document that is appropriate, the WAIS protocol automatically
handles the download process from the server. This is quite different
from existing services, where manual file-capture mechanisms require
vigilance. With the WAIS protocol, all documents look like they are
local to your system.

Figure 3: Chaining questions permits you to use a query on multiple
information sources by opening a new question and dragging previous
query results into the similar to field. You can also apply the
similar to operation to invoke a new document search, as in this
example. (Courtesy of Thinking Machines Corp.)

MAY 1991 • BYTE

The WAIS protocol incorporates two important modifications that the
NISO Z39.50 standard does not address. First, it permits hypermedia
document transport. Most documents today are composed primarily of
ASCII text codes and sequences, but the next generation of documents,
constructed from hypermedia and multimedia sources, integrates images
and fully formatted text. These media forms are rapidly becoming
popular and conventional.

Second, the WAIS protocol is stateless for the server. It does not have
to keep any information about the client between transactions, because
the user's state is kept on the local workstation. Every search or
retrieval operation is a separate process. The contexts are decoupled
under the statelessness of the protocol. This decoupling lets you make
a search, store away the document pointer, and retrieve it later.

Further, you can use a dynamic folder to pass one of these document
pointers to someone else who can also retrieve the document. A document
pointer is like an International Standard Book Number for the
electronic age. (The ISBN is a unique identification assigned to each
publication.) Passing a document pointer conforms with copyright law
and lets you easily return to the document source instead of making
copies.

The WAIS protocol is designed to transport information through modems,
X.75 communications, or network backbones. This flexibility provides an
enormous framework within which to conduct retrieval transactions. For
example, with a portable computer, you could connect with a WAIS hub
through a modem and post dynamic folders, directing the query results
to be routed to your office system for later examination.
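The stateless exchange and the document pointers described in this section can be sketched as follows. The server keeps its documents but remembers nothing about who searched; the client holds the pointers and can present them later or hand them to someone else. The class and field names are hypothetical and do not reflect the real WAIS wire format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DocumentPointer:
    """Identifies a document without copying it (field names are illustrative)."""
    server: str
    doc_id: str


class StatelessServer:
    """Keeps documents but no per-client state between requests."""

    def __init__(self, name: str, documents: Dict[str, str]) -> None:
        self.name = name
        self._documents = documents

    def search(self, question: str) -> List[DocumentPointer]:
        terms = question.lower().split()
        return [
            DocumentPointer(self.name, doc_id)
            for doc_id, text in self._documents.items()
            if any(term in text.lower() for term in terms)
        ]

    def retrieve(self, pointer: DocumentPointer) -> str:
        # Everything needed to serve the request travels in the pointer itself.
        return self._documents[pointer.doc_id]


server = StatelessServer("hub", {"d1": "RISC benchmark review", "d2": "Stock split"})
saved = server.search("risc")       # the client stores the pointers locally
print(server.retrieve(saved[0]))    # later, or by another user holding the pointer
```

Since retrieve depends only on the pointer, it makes no difference whether the request comes a second or a month after the search, or from a different workstation.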


Retrieval  Technology 

The computing infrastructure needed to implement WAISes varies with a
server's functionality. A Library of Congress WAIS, with 25 terabytes
of data, could not expeditiously dispatch queries and function if a
serial computer were used to process the information. For a problem of
this magnitude, massive parallelism is needed. The Connection Machine's
Text-Retrieval System is a viable information-retrieval system for
gigabyte-size databases.

The DowQuest service from Dow Jones runs on the Connection Machine. The
service incorporates approximately 1 gigabyte of original text derived
from over 400 sources. The Wall Street Journal, the Washington Post,
Barron's, Fortune, Forbes, and several regional business and technical
journals are included, covering the previous eight calendar months. The
search time with a 100-word query composed of typed English and
relevance feedback (e.g., "more like that one") is less than half a
second. The system can provide access to many gigabytes of text and to
thousands of users interactively.

The projections for the Connection Machine system indicate that when it
is scaled to a 1-terabyte database with 10-word queries, obtaining an
answer in 10 seconds or less is highly probable. This performance is
accomplished by harnessing the Connection Machine's 65,536 separate
processors to execute a parallel index algorithm (see reference 6).
These estimates are phenomenal and truly indicative of the computing
power manifest in parallel systems. No serial machine can even come
close to this level of performance.
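A back-of-the-envelope calculation shows why the projection is plausible. It assumes that 1 terabyte means 2**40 bytes, that the database is divided evenly among the processors, and that each processor examines its whole share once per query; none of these assumptions comes from the article, which gives only the processor count, the database size, and the 10-second target.

```python
PROCESSORS = 65_536            # from the article
DATABASE_BYTES = 2 ** 40       # 1 terabyte, assumed to mean 2**40 bytes
DEADLINE_SECONDS = 10          # from the article

# Even split of the database across all processors (an assumption).
share = DATABASE_BYTES // PROCESSORS
# Rate at which one processor must work through its share to meet the deadline.
rate = share / DEADLINE_SECONDS

print(f"per-processor share: {share / 2**20:.0f} MB")
print(f"scan rate needed:    {rate / 2**20:.1f} MB/s per processor")
```

Under these assumptions each processor is responsible for 16 MB and must cover about 1.6 MB per second, a modest rate per processor that becomes formidable only when multiplied by 65,536.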

The Connection Machine system generates these results by searching the
entire contents of an archive, not a representative abstract or a
keyword frequency table. Each document within the archive is used to
determine a match. This is not typical for systems organized around
serial computers, and it is another dramatic demonstration of
parallel-computing technology.

The cost of a system like the Connection Machine runs in the millions
of dollars. But a Macintosh with a 100-megabyte hard disk drive or a
386-based PC can serve the typical WAIS user.

Immense  Promise 

The prototype WAIS user interface and protocol are currently being
beta-tested at Thinking Machines, Apple Computer, and Dow Jones
News/Retrieval. Thinking Machines, the principal developer of the WAIS
architecture and software, plans to share the WAIS protocol free of
charge and hopes to help user-interface developers build interfaces to
WAIS servers.

While still a research project that is undergoing development and
refinement, the WAIS holds immense promise. Information commerce,
buoyed through the widespread acceptance of computer systems and
networks, forces individuals and companies to expedite transactions and
simplify activities. These coveted sources of efficiency stand out as
prominent allies of competitive advantage.